Spectro-temporal directional derivative features for automatic speech recognition
Authors
Abstract
We introduce a novel spectro-temporal representation of speech by applying directional derivative filters to the Mel spectrogram, with the aim of improving the robustness of automatic speech recognition. Previous studies have shown that two-dimensional wavelet functions, when tuned to appropriate spectral scales and temporal rates, are able to accurately capture the acoustic modulations of speech, even in high noise conditions. Therefore, spectro-temporal features extracted from the wavelet transformation of the spectrogram offer additional noise robustness to important signal processing tasks, such as voice activity detection and speech recognition. In this paper, we explore the use of the steerable pyramid, a directional wavelet transform that is common in image processing, to derive a spectro-temporal feature representation of speech that can serve as an alternative to cepstral derivatives and Gabor filterbank features. We discuss the application of these features to the task of robust automatic speech recognition. Experiments conducted on the Aurora-2 database demonstrate that their robustness is competitive with other state-of-the-art speech features, especially in low signal-to-noise ratio conditions.
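The idea of a directional derivative over a time-frequency image can be illustrated with a minimal sketch. This is not the steerable pyramid used in the paper: it is a single-scale, finite-difference simplification that relies only on the standard fact that a first-order derivative can be steered to any angle as a weighted sum of the two axis-aligned derivatives. The function name `directional_derivatives`, the choice of four angles, and the toy input are assumptions made for illustration.

```python
import numpy as np

def directional_derivatives(log_mel, angles_deg=(0, 45, 90, 135)):
    """Steer first-order derivatives of a (n_mels, n_frames) array.

    Hypothetical helper: angle 0 is the derivative along time,
    angle 90 is the derivative along frequency.
    """
    # np.gradient returns one array per axis: axis 0 = frequency, axis 1 = time.
    d_freq, d_time = np.gradient(log_mel)
    feats = []
    for angle in angles_deg:
        theta = np.deg2rad(angle)
        # Steering identity for first-order derivatives.
        feats.append(np.cos(theta) * d_time + np.sin(theta) * d_freq)
    return np.stack(feats)  # shape: (n_angles, n_mels, n_frames)

# Toy input (not real speech): a plane with slope 2 along frequency
# and slope 3 along time, so the derivatives are known exactly.
n_mels, n_frames = 8, 10
f, t = np.meshgrid(np.arange(n_mels), np.arange(n_frames), indexing="ij")
toy = 2.0 * f + 3.0 * t
out = directional_derivatives(toy)
```

In this sketch the 0 and 90 degree outputs correspond to plain temporal and spectral differences, while the oblique angles respond to energy that moves in frequency over time. A multi-scale transform such as the steerable pyramid would add smoothing and several spectral scales and temporal rates on top of this.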
Related resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters are temporally tracked and temporal tra...
Spectro-temporal Gabor features as a front end for automatic speech recognition
A novel type of feature extraction is introduced to be used as a front end for automatic speech recognition (ASR). Two-dimensional Gabor filter functions are applied to a spectro-temporal representation formed by columns of primary feature vectors. The filter shape is motivated by recent findings in neurophysiology and psychoacoustics which revealed sensitivity towards complex spectro-temporal ...
Methods for capturing spectro-temporal modulations in automatic speech recognition
Psychoacoustical and neurophysiological results indicate that spectro-temporal modulations play an important role in sound perception. Speech signals, in particular, exhibit distinct spectro-temporal patterns which are well matched by receptive fields of cortical neurons. In order to improve the performance of automatic speech recognition (ASR) systems, a number of different approaches are prese...
Multi-stream to many-stream: using spectro-temporal features for ASR
We report progress in the use of multi-stream spectro-temporal features for both small and large vocabulary automatic speech recognition tasks. Features are divided into multiple streams for parallel processing and dynamic utilization in this approach. For small vocabulary speech recognition experiments, the incorporation of up to 28 dynamically-weighted spectro-temporal feature streams along w...